Data Processing

Show code
from pathlib import Path
import pandas as pd
import geopandas as gpd
import requests
import censusdata

BASE_DIR = Path(".").resolve()
DATA_DIR = BASE_DIR / "Data"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

1.1 Data Collection

We first set up a clear project directory structure using pathlib, defining base, raw, and processed data folders. We then load Philadelphia census block group polygons from the GeoJSON file obtained from OpenDataPhilly. Census block groups serve as our spatial unit of analysis because they provide an appropriate balance between demographic detail, geographic precision, and data availability for evaluating bicycle infrastructure equity in Philadelphia. We also inspect the number of block groups and the coordinate reference system (CRS) to confirm that the spatial units are suitable for further analysis.

Show code
bg_path = RAW_DIR / "Census_Block_Groups_2010.geojson"
block_groups = gpd.read_file(bg_path)

1.2 Load ACS 2023 5-Year Data via censusdata

To access socioeconomic variables at the block group level within Philadelphia County, we use the censusdata package to download 2023 ACS 5-year estimates. The selected variables include total population, median household income, zero-vehicle households, and bicycle commuters.

Because the ACS download uses a hierarchical geography index, we reconstruct a full 12-digit GEOID by defining a helper function make_geoid_full that concatenates state, county, tract, and block group codes from x.geo. This reconstructed GEOID is then used to merge the ACS dataset with the block group geometries using a left join. ACS error codes and placeholder values are replaced with NaN to ensure the dataset is clean and reliable for further analysis.

Show code
acs_vars = [
    "B01003_001E",
    "B19013_001E",
    "B25044_003E",
    "B08301_018E",
]

acs_2023 = censusdata.download(
    "acs5",
    2023,
    censusdata.censusgeo([
        ("state", "42"),
        ("county", "101"),
        ("block group", "*"),
    ]),
    acs_vars,
)
Show code
def make_geoid(x):
    state = x.geo[0][1]
    county = x.geo[1][1]
    tract = x.geo[2][1]
    blkgrp = x.geo[3][1]
    return state + county + tract + blkgrp

acs_2023["GEOID"] = acs_2023.index.to_series().apply(make_geoid)

if "GEOID10" in block_groups.columns:
    block_groups = block_groups.rename(columns={"GEOID10": "GEOID"})

block_groups_acs = block_groups.merge(
    acs_2023,
    on="GEOID",
    how="left",
).copy()

block_groups_acs = block_groups_acs.replace(
    [-666666666, -666666667, -222222222],
    pd.NA,
)
Show code
block_groups_acs = block_groups_acs.replace(
    [-666666666, -666666667, -222222222],
    pd.NA,
)

block_groups_acs[["B01003_001E", "B19013_001E", "B25044_003E", "B08301_018E"]].describe()

1.3 Load Indego Station Data

For Indego station locations, we access the real-time API endpoint to retrieve live station data. The returned GeoJSON is parsed into a GeoDataFrame with CRS EPSG:4326. This ensures that the station dataset is up to date and ready for spatial processing.

Show code
stations_url = "https://bts-status.bicycletransit.workers.dev/phl"
response = requests.get(stations_url)
response.raise_for_status()

data = response.json()
features = data["features"]

stations_gdf = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")
print("Number of stations:", len(stations_gdf))
stations_gdf.head()

1.4 Reproject Coordinate Systems

To perform accurate distance calculations, both census block groups and Indego stations are reprojected into a common projected CRS: EPSG:2272. This projection allows distance to be measured in feet and ensures consistency across all geospatial operations.

Show code
stations_url = "https://bts-status.bicycletransit.workers.dev/phl"
response = requests.get(stations_url)
response.raise_for_status()
data = response.json()

features = data["features"]
stations_gdf = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")

TARGET_CRS = "EPSG:2272"
bg_proj = block_groups_acs.to_crs(TARGET_CRS).copy()
stations_proj = stations_gdf.to_crs(TARGET_CRS).copy()

bg_clean_path = PROCESSED_DIR / "bg_clean_2272.gpkg"
stations_clean_path = PROCESSED_DIR / "stations_clean_2272.gpkg"

bg_proj.to_file(bg_clean_path, layer="bg", driver="GPKG")
stations_proj.to_file(stations_clean_path, layer="stations", driver="GPKG")

print("Saved:", bg_clean_path)
print("Saved:", stations_clean_path)

1.5 Interactive Block Group Map with Socioeconomic Data

We created an interactive map using Folium to visualize demographic characteristics at the census block group level. The map includes a hover tooltip that displays each block group’s total population, median household income, number of zero-vehicle households, and number of bicycle commuters. This allows users to explore socioeconomic patterns across the city and provides important context for interpreting later need and access metrics.

Show code
import folium
from branca.element import Template, MacroElement

block_groups_acs["pop_total"] = block_groups_acs["B01003_001E"]
block_groups_acs["median_income"] = block_groups_acs["B19013_001E"]
block_groups_acs["zero_vehicle_households"] = block_groups_acs["B25044_003E"]
block_groups_acs["bike_commuters"] = block_groups_acs["B08301_018E"]

bg_poly_wgs = block_groups_acs.to_crs(epsg=4326).copy()
bg_proj_cent = block_groups_acs.copy()
bg_proj_cent["centroid"] = bg_proj_cent.geometry.centroid
bg_proj_cent = bg_proj_cent.set_geometry("centroid")
bg_centroid_wgs = bg_proj_cent.to_crs(epsg=4326).copy()

center_lat = bg_centroid_wgs.geometry.y.mean()
center_lon = bg_centroid_wgs.geometry.x.mean()

m_bg_acs = folium.Map(
    location=[center_lat, center_lon],
    zoom_start=11,
    tiles="CartoDB Positron"
)

def style_bg_outline(feature):
    return {
        "fillColor": "#f7fbff",
        "color": "#cccccc",
        "weight": 0.4,
        "fillOpacity": 0.4,
    }

folium.GeoJson(
    bg_poly_wgs,
    style_function=style_bg_outline,
    tooltip=folium.GeoJsonTooltip(
        fields=[
            "GEOID",
            "pop_total",
            "median_income",
            "zero_vehicle_households",
            "bike_commuters",
        ],
        aliases=[
            "GEOID",
            "Total population",
            "Median income (USD)",
            "Zero-vehicle households",
            "Bike commuters",
        ],
        localize=True
    )
).add_to(m_bg_acs)

legend_html = """
{% macro html(this, kwargs) %}

<div style="
  position: fixed;
  bottom: 20px;
  left: 20px;
  z-index: 9999;
  background-color: white;
  padding: 12px 14px;
  border: 1px solid #ccc;
  border-radius: 6px;
  box-shadow: 0 0 10px rgba(0,0,0,0.15);
  font-size: 13px;
  line-height: 1.4;
">

<b>Census Block Groups & ACS</b><br>
Block group boundaries with key<br>
ACS variables available in the<br>
tooltip (population, income,<br>
zero-vehicle households, and<br>
bike commuters).

</div>

{% endmacro %}
"""

legend = MacroElement()
legend._template = Template(legend_html)
m_bg_acs.get_root().add_child(legend)

m_bg_acs
Make this Notebook Trusted to load map: File -> Trust Notebook

1.6 Exploratory Map of Active Indego Bike Stations

We also mapped the current active Indego bike stations to visualize their spatial distribution across Philadelphia. This map helps illustrate where stations are located within block groups and highlights their concentration in the central parts of the city. As shown, most active stations are clustered in central Philadelphia, with limited extensions into South Philadelphia. This pattern reflects historical funding constraints and previous boundaries of the service area.

Show code
import folium
from branca.element import Template, MacroElement

st_wgs = stations_gdf.to_crs(epsg=4326).copy()
bg_poly_wgs = block_groups_acs.to_crs(epsg=4326).copy()

center_lat = st_wgs.geometry.y.mean()
center_lon = st_wgs.geometry.x.mean()

m_stations = folium.Map(
    location=[center_lat, center_lon],
    zoom_start=12,
    tiles="CartoDB Positron"
)

def style_bg_gray(feature):
    return {
        "fillColor": "rgba(0,0,0,0)",
        "color": "#cccccc",
        "weight": 0.6,
        "fillOpacity": 0.0,
    }

folium.GeoJson(
    bg_poly_wgs,
    style_function=style_bg_gray,
    name="Block Groups"
).add_to(m_stations)

name_col = None
for c in ["name", "station_name", "station", "kiosk_id"]:
    if c in st_wgs.columns:
        name_col = c
        break

for _, row in st_wgs.iterrows():
    lat = row.geometry.y
    lon = row.geometry.x
    name_val = row[name_col] if name_col else f"Station {row.name}"

    popup_html = f"{name_col or 'Station'}: {name_val}"

    folium.CircleMarker(
        location=[lat, lon],
        radius=4,
        color="#005b96",
        fill=True,
        fill_color="#1f78b4",
        fill_opacity=0.95,
        popup=popup_html
    ).add_to(m_stations)

legend_html = """
{% macro html(this, kwargs) %}

<div style="
  position: fixed;
  bottom: 20px;
  left: 20px;
  z-index: 9999;
  background-color: white;
  padding: 12px 14px;
  border: 1px solid #ccc;
  border-radius: 6px;
  box-shadow: 0 0 10px rgba(0,0,0,0.15);
  font-size: 13px;
  line-height: 1.4;
">

<b>Existing Indego Stations</b><br>
Blue points show current station<br>
locations. Light gray boundaries<br>
represent census block groups<br>
used in this analysis.

</div>

{% endmacro %}
"""

legend = MacroElement()
legend._template = Template(legend_html)
m_stations.get_root().add_child(legend)

m_stations
Make this Notebook Trusted to load map: File -> Trust Notebook
Show code
bg_clean_path = PROCESSED_DIR / "bg_clean_2272.gpkg"
stations_clean_path = PROCESSED_DIR / "stations_clean_2272.gpkg"

bg_proj.to_file(bg_clean_path, layer="bg", driver="GPKG")
stations_proj.to_file(stations_clean_path, layer="stations", driver="GPKG")

print("Saved:", bg_clean_path)
print("Saved:", stations_clean_path)